ABSTRACT

This study evaluated the validity of the behavioral categories held in
common among three classroom observation systems. The validity model employed
was that reported by Campbell and Fiske (1959), which requires that both
convergent and discriminant validity be demonstrated. These procedures were
applied to data obtained from the videotapes of 62 teacher trainees to ascertain
their usefulness and applicability as a model for the validation of classroom
observation systems. The validation procedures employed in this study
were found to be an economical and useful method for examining the
validity of all classroom observation systems. The advantages and
limitations of the method employed are discussed. (Author)
Convergent and Discriminant Validation of Three Classroom
Observation Systems: A Proposed Model

Gary D. Borich and David Malitz

Numerous instruments have been developed to observe classroom behavior
systematically. Such instruments typically consist of a number of categories of
teacher-student behavior which an observer tallies or rates periodically as he
watches classroom interaction. While the reliability of these systems has
been investigated, proper evaluation of their validity has been lacking.

The present study undertook to evaluate the validity of selected categories
which several classroom observation instruments held in common. The validity model
reported by Campbell and Fiske (1959) was employed; it requires that both
convergent and discriminant validity be demonstrated.

Convergent validity is the confirmation of a trait (or variable or category)
by independent measuring methods; it requires a significant correlation between
two methods (or systems) measuring the same trait. Discriminant validity is a
requirement that "the correlation between different measures measuring the same
trait exceed (a) the correlations obtained between that trait and any other
trait not having method in common and (b) the correlations between different
traits which happen to employ the same method" (Borich and Bauman, 1972).
By determining intercorrelations among categories in a multitrait-multimethod
matrix, one can identify categories which pass specified tests of convergent
and discriminant validity. These procedures were applied to the following
data in order to ascertain their usefulness and applicability as a model for
the validation of classroom observation systems.
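To make the terminology concrete, the sketch below labels every cell of a two-method multitrait-multimethod matrix with the Campbell-Fiske region it belongs to. It is a minimal illustration under stated assumptions, not the authors' procedure: the method names IAST and FAIR and the trait letters serve only as labels, and the function names are hypothetical.

```python
from collections import Counter


def classify_cell(row, col):
    """Return the Campbell-Fiske region for the matrix cell at (row, col).

    row and col are (method, trait) tuples, e.g. ("IAST", "A").
    """
    (method_r, trait_r), (method_c, trait_c) = row, col
    if method_r == method_c and trait_r == trait_c:
        return "reliability diagonal"        # same trait, same method
    if method_r != method_c and trait_r == trait_c:
        # The validity diagonal runs through the heteromethod block; it is
        # labelled separately here only so its cells are easy to pick out.
        return "validity diagonal"           # same trait, different methods
    if method_r != method_c:
        return "heterotrait-heteromethod"    # different traits, different methods
    return "heterotrait-monomethod"          # different traits, same method


def mtmm_layout(methods, traits):
    """Map every (row, col) cell of the full matrix to its region."""
    order = [(m, t) for m in methods for t in traits]
    return {(r, c): classify_cell(r, c) for r in order for c in order}


# Two methods and two traits, the smallest matrix that has every region.
layout = mtmm_layout(["IAST", "FAIR"], ["A", "B"])
print(Counter(layout.values()))
```

With two methods and two traits, each of the four regions contains four of the sixteen cells.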
The University of Texas at Austin 

Method

Data were obtained from a study of 62 teacher trainees at The University
of Texas. All but two of the trainees were female. At the end of the student
teaching semester, a videotape was made of 20 minutes of each trainee's classroom
interaction. The videotapes were observed by two judges who rated the interaction
using the Interaction Analysis for the Study of Science Teaching, IAST
(Hall, 1972), the Fuller Affective Interaction Record, FAIR (Fuller, 1959) and
the Classroom Observation Record, COS (Emmer and Peck, 1973). The IAST, FAIR and COS
systems are described in Rosenshine and Furst's chapter in the Second Handbook
of Research on Teaching (Travers, 1973) and were chosen on the basis of commonalities
in the behavior they purport to measure.

Descriptions of the behavior categories of the three systems were obtained
from their coding manuals, and categories were grouped across systems if, from the
category descriptions, it appeared that they measured the same behaviors. From these
comparisons, 12 IAST categories were paired with nine FAIR categories; four IAST
categories were paired with two COS categories; and, across all three systems,
seven IAST categories, five FAIR categories and four COS categories were grouped
(there were no COS-FAIR pairings which were not included in the three-system
grouping). The exact pairings are identified in Tables 1, 2 and 3.
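Because each comparison is treated as its own unit, the pairings can be held as a list of records in which a category may recur. The sketch below is a hypothetical data structure: the name IAST_FAIR_PAIRS and the split of each Table 1 label at its slash into an IAST half and a FAIR half are assumptions made for illustration. It recovers the counts stated above for the IAST vs. FAIR pairings.

```python
from collections import Counter

# Hypothetical record layout: (comparison letter, IAST category, FAIR category),
# with labels abbreviated as in Table 1.
IAST_FAIR_PAIRS = [
    ("A", "accepts feelings", "values"),
    ("B", "questions student's stmt.", "delves"),
    ("C", "confirms student's stmt.", "OK"),
    ("D", "open question", "initiates"),
    ("E", "criticizes", "criticizes"),
    ("F", "looks at notes", "tangential"),
    ("G", "non-functional behavior", "woolgathering"),
    ("H", "lecture", "lecture"),
    ("I", "review", "lecture"),
    ("J", "read aloud", "lecture"),
    ("K", "substantive closed stmt.", "questions"),
    ("L", "substantive open stmt.", "questions"),
]

iast_categories = {iast for _, iast, _ in IAST_FAIR_PAIRS}
fair_categories = {fair for _, _, fair in IAST_FAIR_PAIRS}
fair_reuse = Counter(fair for _, _, fair in IAST_FAIR_PAIRS)

print(len(IAST_FAIR_PAIRS), "comparisons")
print(len(iast_categories), "IAST categories,", len(fair_categories), "FAIR categories")
# FAIR categories that enter more than one comparison
print({name: n for name, n in fair_reuse.items() if n > 1})
```

Run as written, it reports 12 comparisons built from 12 IAST and nine FAIR categories, with FAIR's "lecture" used three times and "questions" twice.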

In certain cases, a single variable from one system was paired with several
variables in another system. For the purposes of constructing the heterotrait-heteromethod
matrix, each comparison can be considered unique, even if several
comparisons include the same variable. Thus, in the IAST vs. FAIR comparisons,
category H consists of "lecture" (IAST) and "lecture" (FAIR), while category I
consists of "review" (IAST) and "lecture" (FAIR), both categories having FAIR's
"lecture" category in common.
Once the categories to be investigated had been identified, Pearson product-moment
correlations were computed. These correlations were used to construct
three multitrait-multimethod matrices: IAST vs. FAIR, IAST vs. COS, and IAST vs.
FAIR vs. COS. For each matrix, a heterotrait-heteromethod block was formed from
the correlations between categories belonging to different systems. A heterotrait-
heteromethod block is illustrated in Fig. 1 with the first two categories of
behavior listed in Table 1.
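The computation behind these matrices can be sketched as follows. This is a minimal pure-Python illustration, not the authors' software; the function names and the input layout (one list of per-trainee tallies for each category, in the same trainee order for every system) are assumptions, and the tallies in the example are invented.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("score lists must have equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a category with no variation has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def heteromethod_block(system_1, system_2):
    """Correlate every category of one system with every category of another.

    system_1 and system_2 map a comparison letter to that category's tallies,
    one per trainee. block[a][b] = r(system-1 category a, system-2 category b);
    the cells with a == b form the validity diagonal.
    """
    return {a: {b: pearson_r(xa, xb) for b, xb in system_2.items()}
            for a, xa in system_1.items()}


# Invented tallies for four trainees (not data from the study).
iast = {"A": [3, 5, 2, 6], "B": [1, 0, 4, 2]}
fair = {"A": [4, 6, 3, 8], "B": [2, 2, 3, 0]}
block = heteromethod_block(iast, fair)
validity_diagonal = [block[k][k] for k in iast]
print(validity_diagonal)
```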

For each matrix, a diagonal (called the validity diagonal) is formed through
the heterotrait-heteromethod block by the series of cells in which categories
coincide but systems differ. Values in the validity diagonal which are significantly
different from zero are evidence for convergent validity. Discriminant
validity must be assessed in two steps. First, each validity value must be compared
with all values in its row and column in the heterotrait-heteromethod block
to determine whether the correlation between different methods of measuring the
same category exceeds correlations between that category and other categories not
having method in common. In a second step, the heterotrait-monomethod triangles
are examined to determine whether the correlation between different methods of
measuring the same category exceeds correlations between that category and other
categories which have method in common. This step is completed by comparing each
category's validity diagonal value with values in the heterotrait-monomethod
triangles in which that category is involved. This two-step procedure was carried
out for each validity diagonal value in each of the three matrices, and the results
were entered in Tables 1, 2 and 3.
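The two-step discriminant test amounts to counting, for each category, how many competing correlations exceed its validity value. The sketch below assumes the correlations are stored in a dictionary keyed by pairs of (system, category) cells, that the matrix is symmetric, and that values are compared as signed numbers rather than absolute values (the text does not say which). The function names and all numbers in the example are invented for illustration.

```python
def make_lookup(corr):
    """Return r(x, y) for a symmetric matrix stored with one key per pair."""
    def r(x, y):
        if (x, y) in corr:
            return corr[(x, y)]
        return corr[(y, x)]
    return r


def discriminant_counts(corr, s1, s2, traits, t):
    """Return (validity value, no. heteromethod higher, no. monomethod higher)."""
    r = make_lookup(corr)
    others = [o for o in traits if o != t]
    validity = r((s1, t), (s2, t))
    # Step 1: the category's row and column in the heterotrait-heteromethod block.
    hetero = ([r((s1, t), (s2, o)) for o in others]
              + [r((s1, o), (s2, t)) for o in others])
    # Step 2: the category's entries in both heterotrait-monomethod triangles.
    mono = ([r((s1, t), (s1, o)) for o in others]
            + [r((s2, t), (s2, o)) for o in others])
    return (validity,
            sum(v > validity for v in hetero),
            sum(v > validity for v in mono))


# Invented correlations for three categories coded by two systems.
I_, F_ = "IAST", "FAIR"
corr = {
    ((I_, "A"), (F_, "A")): 0.60, ((I_, "B"), (F_, "B")): 0.10,
    ((I_, "C"), (F_, "C")): 0.50,
    ((I_, "A"), (F_, "B")): 0.20, ((I_, "A"), (F_, "C")): 0.15,
    ((I_, "B"), (F_, "A")): 0.25, ((I_, "B"), (F_, "C")): 0.05,
    ((I_, "C"), (F_, "A")): 0.30, ((I_, "C"), (F_, "B")): 0.12,
    ((I_, "A"), (I_, "B")): 0.35, ((I_, "A"), (I_, "C")): 0.10,
    ((I_, "B"), (I_, "C")): 0.40,
    ((F_, "A"), (F_, "B")): 0.05, ((F_, "A"), (F_, "C")): 0.20,
    ((F_, "B"), (F_, "C")): 0.18,
}
TRAITS = ["A", "B", "C"]
for trait in TRAITS:
    print(trait, discriminant_counts(corr, I_, F_, TRAITS, trait))
```

For these invented values, category B (validity .10) is exceeded three times in each step, while A and C are never exceeded.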

Results 

For the IAST vs. FAIR comparisons shown in Table 1, five validity
diagonal values failed to show convergent validity by falling short of the .05
level of significance. These five categories (B, G, I, K and L) also failed to
show discriminant validity, as they were exceeded numerous times in their heterotrait-
heteromethod block and in their heterotrait-monomethod triangles. Category
F was somewhat inconsistent. It did not show strong discriminant validity but
did show convergent validity. The remaining categories (A, C, D,
E, H and J), however, present strong cases for both types of validity. All of these
categories have significant (p < .05) validity diagonal values, and most are
significant at the .001 level. None of the categories was exceeded by more than one of
the 22 values in its row and column in the heteromethod block. Four of the
categories (A, C, E and H) were not exceeded by any heteromethod value. Categories
C, E and H were not exceeded by any monomethod values, while the other categories
(A, D, J) were not exceeded by more than three of the 22 values.

Overall, the picture for IAST and FAIR shows that categories C, E and H
display excellent convergent and discriminant validities, with highly significant
(p < .001) validity diagonal values and perfect records in the heteromethod
blocks and monomethod triangles. Categories A and D and, to a lesser extent, J
present strong cases for both types of validity, with significant validity
diagonal values and good records in the heteromethod blocks and monomethod
triangles. Category F is an ambiguous case, showing evidence for convergent
validity but weaker evidence for discriminant validity. The
remaining categories (B, G, I, K and L) show no evidence for either type of
validity.

Validities appear quite poor in the comparisons of IAST with COS (Table 2).
Of the four comparisons, two (B and C) produced validity diagonal values
which were nonsignificant (p > .05). The A and D values did, however, reach the .01
significance level. With four comparisons there are only six values in the
heteromethod block, and six in the monomethod triangles, with which each validity
diagonal value is compared. Thus, if it is exceeded by any of them, this must count
heavily against concluding for discriminant validity. Categories B and C are
clearly exceeded too many times to have discriminant validity, and categories
A and D would also appear to be exceeded too often to have discriminant validity.
One must conclude, therefore, that in the comparison of the IAST and COS
categories, two show convergent validity (A and D) but none display discriminant
validity.
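Whether a validity diagonal value is significant depends only on r and the sample size (here N = 62). The sketch below checks the four Table 2 values using Fisher's z approximation from the Python standard library. This is a swapped-in approximation: the paper does not state which significance test was used (the exact test for a Pearson r is a t test with N - 2 degrees of freedom), and the letters A to D are assumed to follow the row order of Table 2.

```python
import math
from statistics import NormalDist


def fisher_z_p(r, n):
    """Approximate two-tailed p-value for H0: rho = 0 using Fisher's z transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Validity diagonal values from Table 2 (IAST vs. COS), N = 62.
table_2 = {"A": 0.3375, "B": 0.1720, "C": 0.2431, "D": 0.4415}

for letter, r in table_2.items():
    p = fisher_z_p(r, 62)
    verdict = "significant at .05" if p < 0.05 else "not significant"
    print(f"{letter}: r = {r:.4f}, p ~ {p:.4f} ({verdict})")
```

Under this approximation A and D fall below .01 while B and C do not reach .05, the same pattern reported for Table 2.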

In the three-way comparison of IAST, FAIR and COS categories (Table 3), three
categories (E, F and H) show excellent evidence for convergent and discriminant
validity across all three systems. All three categories have highly significant
(p < .001) validity diagonal values in all three comparisons. Categories E
and F have perfect records in all three heteromethod blocks and monomethod
triangles, while category H is exceeded only once, in the heteromethod block of
the IAST vs. FAIR comparison. Categories A and C show good evidence for validity
across the three systems, although discriminant validity is questionable in the FAIR
vs. COS comparison, especially for category A. None of the other categories
(B, D, G, I) shows evidence for either kind of validity across all three systems.

Discussion 

In the various comparisons across the three systems, a number of categories
have been shown not to pass tests for convergent and discriminant validity. The
failure of certain categories to demonstrate validity could have been caused by
failure of the categories to measure the behavior they purport to measure or by
improperly equating categories which, in fact, are not equivalent. It is
difficult to say from the data which of these factors was operating for any
particular category. Hence, it is impossible to say that any category is invalid;
the most one can say is that it failed to demonstrate validity. It should be
noted that, in most cases, categories which failed to demonstrate validity failed
to show either convergent or discriminant validity. If a large number of variables
had shown convergent validity but failed to show discriminant validity, one would
suspect that strong method variance was outweighing the category (trait)
variance. Yet it was not high values in the heteromethod blocks or in the
monomethod triangles which disqualified most categories; it was low, nonsignificant
validity values which were easily exceeded by almost any other value. Some
strong, significant values were found in the monomethod triangles (e.g., FAIR's
"delves" and "initiates" had a correlation of .59, p < .001), indicating that
a few of each system's categories are not entirely independent of one another.
Yet, generally speaking, the monomethod values were low, so that one could
conclude that most categories were measuring some unique behavior.

A number of problems were encountered in applying Campbell and Fiske's 
model to these data. For this study, a subset of categories was selected from
each system because some categories in the three systems did not correspond to 
one another. Corresponding categories had to be picked out and matched up in 
order to test validity. Yet, while validity is usually thought of in terms of 
a category's use within its system as a whole, validity was actually tested 
against the subset. The nature of the test for discriminant validity (comparing 
one value with a series of other values) makes it more difficult to demonstrate 
discriminant validities when a large number of categories is being compared. 
Because each value was compared with a subset of the possible values, it was 
easier for each value to pass the discriminant validity test than it would have 
been if all system categories had been compared. This may have given some cate- 
gories the appearance of discriminant validity which they would not have in the 
context of their complete system. 
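The effect of matrix size on the difficulty of the discriminant test can be seen by counting how many values each validity value must beat. Assuming a two-system matrix with k paired categories, a validity value faces k - 1 values in its row and k - 1 in its column of the heteromethod block, and k - 1 in each of the two monomethod triangles. The hypothetical helper below only does that arithmetic; with k = 12 it gives the 22 values cited for the IAST vs. FAIR comparison, and with k = 4 the six values cited for IAST vs. COS.

```python
def competing_values(k):
    """Number of values a validity value is compared against, per step,
    in a two-system multitrait-multimethod matrix with k paired categories.

    Returns (heteromethod, monomethod):
      heteromethod: (k - 1) values in its row + (k - 1) in its column
      monomethod:   (k - 1) values in each of the two system triangles
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return 2 * (k - 1), 2 * (k - 1)


for k in (4, 12):
    print(k, competing_values(k))
```

Because the number of competitors grows linearly with k, a category tested only against a subset faces fewer hurdles than it would within its complete system.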

Another problem with the Campbell-Fiske method was encountered when one
category from one system was paired with several, almost identical, categories
in another system. When one pairs categories, one is hypothesizing that the
two categories measure the same behavior, i.e., that they will demonstrate
convergent validity. But, at the same time, one is hypothesizing that each of
the paired categories differs from other categories in its own system and in the
other system. In other words, a hypothesis about convergent validity necessarily
includes a hypothesis about discriminant validity. It was this second hypothesis
which caused trouble, for when the same category appeared in two pairings, it
appeared as two "independent" categories in its system. Obviously, when these
two "independent" categories were correlated in the monomethod triangle, a value
of 1.00 was obtained, precluding any demonstration of discriminant validity for
that category. When the "independent" categories were correlated with each of
the categories in the other system, duplicate columns or rows appeared in the
heteromethod blocks and the monomethod triangles.

To circumvent these difficulties, the correlations of 1.00 in the monomethod
triangles were ignored, for in the special case of duplicate categories,
a test for the independence of these categories from each other is impossible.
In all other respects, however, these duplicate categories were treated like
all other categories, for each was a component of a unique pairing with another
system's category.
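The duplicate-category problem, and the workaround of ignoring the resulting 1.00 correlations, can be sketched as follows. The tallies are invented, and the mapping from comparison letters to their underlying FAIR variable (here FAIR's "lecture", reused by comparisons H and I) is a hypothetical data structure; only monomethod comparisons between distinct underlying variables are kept.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented FAIR tallies for four trainees. Comparisons H and I both use
# FAIR's "lecture" variable, so they share one column of scores.
lecture = [5, 9, 2, 7]
fair_columns = {"H": lecture, "I": lecture, "K": [1, 0, 3, 2]}
underlying = {"H": "lecture", "I": "lecture", "K": "questions"}

# The two "independent" copies correlate perfectly, so no test of their
# independence from each other is possible.
print(round(pearson_r(fair_columns["H"], fair_columns["I"]), 2))  # 1.0


def monomethod_values(columns, source, t):
    """Monomethod correlations for comparison t, ignoring duplicates of t's variable."""
    return {o: pearson_r(columns[t], columns[o])
            for o in columns
            if o != t and source[o] != source[t]}


print(sorted(monomethod_values(fair_columns, underlying, "H")))  # ['K']
```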

Across the three systems, the results of the study are not encouraging for
researchers who choose to measure classroom interaction. One must infer from
these results that, of the 88 observational coding systems described by Simon
and Boyer (1970), many probably do not meet the standards of convergent and
discriminant validity that were proposed in this study. The researcher must be
cautious in drawing relationships between research studies which use classroom
interaction systems for which the measurement technique itself accounts for
greater variation than the behavior being measured, or for which the same behaviors
measured by different systems fail to correlate. Such findings suggest that
the descriptive titles of categories and behavioral constructs employed by many
observational coding systems may not adequately represent the behavior they
purport to measure. The validation procedures employed in this study were found
to constitute a potentially economical and useful model for examining the
validity of other classroom observation systems.
Figure 1. Simplified Illustration of the Validation Model.
[Figure: multitrait-multimethod matrix for categories A and B of Table 1 as
coded by IAST and FAIR; interjudge reliabilities, shown in parentheses, include
.16, .58 and .84.]
The validity diagonal = .43, -.01; the heterotrait-heteromethod block = .43,
-.01, -.10, -.12; the monomethod triangles = .23 and -.14, respectively.
Table 1. Validities of Variables
from the IAST and FAIR Classroom Observation Systems

N = 62

                                          Convergent  Discriminant Validity
                                          Validity    Heteromethod          Monomethod
Variable Names (IAST/FAIR)                Diagonal    Highest    No.        Highest    No.
                                          Value       Value      Higher     Value      Higher
A  accepts feelings/values                 .429        .272       0          .539       2
B  questions student's stmt./delves       -.011        .701      16          .595      19
C  confirms student's stmt./OK             .812        .259       0          .306       0
D  open question/initiates                 .825        .701       0          .595       0
E  criticizes/criticizes                   .904        .299       0          .549       0
F  looks at notes/tangential               .253        .272       1          .539       3
G  non-functional behavior/woolgathering   .006        .234      19         -.268      18
H  lecture/lecture                         .713       -.250       0         -.308       0
I  review/lecture                         -.135        .713       6          .336       4
J  read aloud/lecture                      .413        .713       1          .549       1
K  substantive closed stmt./questions      .038       -.317      18         -.268      18
L  substantive open stmt./questions        .088        .225       8         -.268       5
Table 2. Validities of Variables from the IAST
and COS Classroom Observation Systems

N = 62

                                          Convergent  Discriminant Validity
                                          Validity    Heteromethod          Monomethod
Variable Names (IAST/COS)                 Diagonal    Highest    No.        Highest    No.
                                          Value       Value      Higher     Value      Higher
A  closed question/convergent eval.        .3375      -.4979      2         -.2991      0
B  closed student stmt./converg. eval.     .1720      -.4979      3         -.2991      2
C  open question/higher cognitive          .2431       .4415      3          .5241      3
D  open student stmt./higher cognitive     .4415       .4979      1          .5241      1